Abstract


DDoS defense today relies on expensive and proprietary hardware appliances deployed at fixed locations. This introduces key limitations with respect to flexibility (e.g., complex routing to get traffic to these “chokepoints”) and elasticity in handling changing attack patterns. We observe an opportunity to address these limitations using new networking paradigms such as software-defined networking (SDN) and network functions virtualization (NFV). Based on this observation, we design and implement Bohatei, a flexible and elastic DDoS defense system. In designing Bohatei, we address key challenges with respect to scalability, responsiveness, and adversary-resilience. We have implemented defenses for several DDoS attacks using Bohatei. Our evaluations show that Bohatei is scalable (handling 500 Gbps attacks), responsive (mitigating attacks within one minute), and resilient to dynamic adversaries.

1 Introduction 


In spite of extensive industrial and academic efforts, distributed denial-of-service (DDoS) attacks continue to plague the Internet. Over the last few years, we have observed a dramatic escalation in the number, scale, and diversity of DDoS attacks. For instance, recent estimates suggest that over 20,000 DDoS attacks occur per day, with peak volumes of 0.5 Tbps. At the same time, new vectors and variations of known attacks [49] are constantly emerging. The damage that these DDoS attacks cause to organizations is well known and includes both monetary losses (e.g., $40,000 per hour) and loss of customer trust.

DDoS defense today is implemented using expensive and proprietary hardware appliances (deployed in-house or in the cloud) that are fixed in terms of placement, functionality, and capacity. First, they are typically deployed at fixed network aggregation points (e.g., a peering edge link of an ISP). Second, they provide fixed functionality with respect to the types of DDoS attacks they can handle. Third, they have a fixed capacity with respect to the maximum volume of traffic they can process. This fixed nature of today’s approach leaves network operators with two unpleasant options: (1) to overprovision by deploying defense appliances that can handle a high (but predefined) volume of every known attack type at each of the aggregation points, or (2) to deploy a smaller number of defense appliances at a central location (e.g., a scrubbing center) and reroute traffic to this location. While option (2) might be more cost-effective, it raises two other challenges. First, operators run the risk of underprovisioning. Second, traffic needs to be explicitly routed through a fixed central location, which introduces additional traffic latency and requires complex routing hacks. Either way, handling larger volumes or new types of attacks typically mandates purchasing and deploying new hardware appliances.

Ideally, a DDoS defense architecture should provide the flexibility to seamlessly place defense mechanisms where they are needed and the elasticity to launch defenses as needed depending on the type and scale of the attack. We observe that similar problems in other areas of network management have been tackled by taking advantage of two new paradigms: software-defined networking (SDN) and network functions virtualization (NFV). SDN simplifies routing by decoupling the control plane (i.e., routing policy) from the data plane (i.e., switches). In parallel, the use of virtualized network functions via NFV reduces cost and enables elastic scaling and reduced time-to-deploy akin to cloud computing [43]. These potential benefits have led major industry players (e.g., Verizon, AT&T) to embrace SDN and NFV.^1

In this paper, we present Bohatei,^2 a flexible and elastic DDoS defense system that demonstrates the benefits of these new network management paradigms in the context of DDoS defense. Bohatei leverages NFV capabilities to elastically vary the required scale (e.g., 10 Gbps vs. 100 Gbps attacks) and type (e.g., SYN proxy vs. DNS reflector defense) of DDoS defense realized by defense virtual machines (VMs). Using the flexibility of SDN, Bohatei steers suspicious traffic through the defense VMs while minimizing user-perceived latency and network congestion.

^1 To quote the SEVP of AT&T: “To say that we are both feet in [on SDN] would be an understatement. We are literally all in.”

^2 It means breakwater in Japanese, used to defend against tsunamis.

In designing Bohatei, we address three key algorithmic and system design challenges. First, the resource management problem to determine the number and location of defense VMs is NP-hard and takes hours to solve. Second, existing SDN solutions are fundamentally unsuitable for DDoS defense (and even introduce new attack avenues) because they rely on a per-flow orchestration paradigm, where switches need to contact a network controller each time they receive a new flow. Finally, an intelligent DDoS adversary can attempt to evade an elastic defense, or alternatively induce provisioning inefficiencies by dynamically changing attack patterns.

We have implemented a Bohatei controller using OpenDaylight, an industry-grade SDN platform. We have used a combination of open source tools (e.g., Open vSwitch, Snort, Bro [46], iptables) as defense modules. We have developed a scalable resource management algorithm. Our evaluation, performed on a real testbed as well as using simulations, shows that Bohatei effectively defends against several different DDoS attack types, scales to scenarios involving 500 Gbps attacks and ISPs with about 200 backbone routers, and can effectively cope with dynamic adversaries.


Contributions and roadmap: In summary, this paper makes the following contributions:

• Identifying new opportunities via SDN/NFV to improve the current DDoS defense practice (§2);

• Highlighting the challenges of applying existing SDN/NFV techniques in the context of DDoS defense (§3);

• Designing a responsive resource management algorithm that is 4-5 orders of magnitude faster than the state-of-the-art solvers (§4);

• Engineering a practical and scalable network orchestration mechanism using proactive tag-based forwarding that avoids the pitfalls of existing SDN solutions (§5);

• An adaptation strategy to handle dynamic adversaries that can change the DDoS attack mix over time (§6);

• A proof-of-concept implementation to handle several known DDoS attack types using industry-grade SDN/NFV platforms (§7); and

• A systematic demonstration of the scalability and effectiveness of Bohatei (§8).

We discuss related work (§9) before concluding (§10).

2 Background and Motivation 

In this section, we give a brief overview of software-defined networking (SDN) and network functions virtualization (NFV) and discuss new opportunities these can enable in the context of DDoS defense.

2.1 New network management trends 


Software-defined networking (SDN): Traditionally, network control tasks (e.g., routing, traffic engineering, and access control) have been tightly coupled with their data plane implementations (e.g., distributed routing protocols, ad hoc ACLs). This practice has made network management complex, brittle, and error-prone. SDN simplifies network management by decoupling the network control plane (e.g., an intended routing policy) from the network data plane (e.g., packet forwarding by individual switches). Using SDN, a network operator can centrally program the network behavior through APIs such as OpenFlow [40]. This flexibility has motivated several real-world deployments to transition to SDN-based architectures.


Network functions virtualization (NFV): Today, network functions (e.g., firewalls, IDSes) are implemented using specialized hardware. While this practice was necessary for performance reasons, it leads to high cost and inflexibility. These limitations have motivated the use of virtual network functions (e.g., a virtual firewall) on general-purpose servers [43]. Similar to traditional virtualization, NFV reduces costs and enables new opportunities (e.g., elastic scaling). Indeed, leading vendors already offer virtual appliance products. Given these benefits, major ISPs have deployed (or are planning to deploy) datacenters to run virtualized functions that replace existing specialized hardware. One potential concern with NFV is low packet processing performance. Fortunately, several recent advances enable line-rate (e.g., 10-40 Gbps) packet processing by software running on commodity hardware. Thus, such performance concerns are increasingly a non-issue and will further diminish given constantly improving hardware support.


2.2 New opportunities in DDoS defense 


Next, we briefly highlight new opportunities that SDN and NFV can enable for DDoS defense.


Lower capital costs: Current DDoS defense is based on specialized hardware appliances. Network operators either deploy them on-premises, or outsource DDoS defense to a remote packet scrubbing site. In either case, DDoS defense is expensive. For instance, based on public estimates from the General Services Administration (GSA) Schedule, a 10 Gbps DDoS defense appliance costs ≈$128,000. To put this in context, a commodity server with a 10 Gbps Network Interface Card (NIC) costs about $3,000. This suggests roughly 1-2 orders of magnitude potential reduction in capital expenses (ignoring software and development costs) by moving from specialized appliances to commodity hardware.^3


Time to market: As new and larger attacks emerge, enterprises today need to frequently purchase more capable hardware appliances and integrate them into the network infrastructure. This is an expensive and tedious process [43]. In contrast, launching a VM customized for a new type of attack, or launching more VMs to handle larger-scale attacks, is straightforward using SDN and NFV.


Elasticity with respect to attack volume: Today, DDoS defense appliances deployed at network chokepoints need to be provisioned to handle a predefined maximum attack volume. As an illustrative example, consider an enterprise network where a DDoS scrubber appliance is deployed at each ingress point. Suppose the projected resource footprint (i.e., defense resource usage over time) to defend against a SYN flood attack at times t_1, t_2, and t_3 is 40, 80, and 10 Gbps, respectively.^4 The total resource footprint over this entire time period is 3 × max{40, 80, 10} = 240 Gbps, as we need to provision for the worst case. However, if we could elastically scale the defense capacity, we would only introduce a resource footprint of 40 + 80 + 10 = 130 Gbps, a 45% reduction in defense resource footprint. This reduced hardware footprint can yield energy savings and allow ISPs to repurpose the hardware for other services.


Flexibility with respect to attack types: Building on the above example, suppose in addition to the SYN flood attack, the projected resource footprint for a DNS amplification attack in time intervals t_1, t_2, and t_3 is 20, 40, and 80 Gbps, respectively. Launching only the required types of defense VMs, as opposed to using monolithic appliances (which handle both attacks), drops the hardware footprint by over 40%; i.e., from 3 × (max{40, 80, 10} + max{20, 40, 80}) = 480 Gbps to (40 + 20) + (80 + 40) + (10 + 80) = 270 Gbps.
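The arithmetic in the two examples above can be reproduced with a short script. This is an illustrative sketch only: the function names and the dictionary layout are hypothetical, and traffic volume (in Gbps) stands in for resource usage, as in the text.

```python
def fixed_footprint(demands):
    """Footprint when every interval is provisioned for each attack type's peak."""
    num_intervals = len(next(iter(demands.values())))
    return num_intervals * sum(max(series) for series in demands.values())


def elastic_footprint(demands):
    """Footprint when capacity tracks the projected demand in each interval."""
    return sum(sum(series) for series in demands.values())


# Projected demand (Gbps) at t_1, t_2, t_3, taken from the examples in the text.
syn_only = {"syn_flood": [40, 80, 10]}
syn_and_dns = {"syn_flood": [40, 80, 10], "dns_amplification": [20, 40, 80]}

for label, demands in [("SYN only", syn_only), ("SYN + DNS", syn_and_dns)]:
    fixed = fixed_footprint(demands)
    elastic = elastic_footprint(demands)
    saving = 100 * (fixed - elastic) / fixed
    print(f"{label}: fixed={fixed} Gbps, elastic={elastic} Gbps, saving={saving:.1f}%")
```

The fixed case multiplies the per-attack peaks by the number of intervals, while the elastic case simply sums the demand actually projected for each interval.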

Flexibility with respect to vendors: Today, network operators are locked in to the defense capabilities offered by specific vendors. In contrast, with SDN and NFV, they can launch appropriate best-of-breed defenses. For example, suppose vendor 1 is better for SYN flood defense, but vendor 2 is better for DNS flood defense. The physical constraints today may force an ISP to pick only one hardware appliance. With SDN/NFV we can avoid the undesirable situation of picking only one vendor and instead have a deployment with both types of VMs, each for a certain type of attack. Looking even further, we also envision that network operators can mix and match capabilities from different vendors; e.g., if vendor 1 has better detection capabilities but vendor 2’s blocking algorithm is more effective, then we can flexibly combine these two to create a more powerful defense platform.

^3 Operational expenses are harder to compare due to the lack of publicly available data.

^4 For brevity, we use the traffic volume as a proxy for the memory consumption and CPU cycles required to handle the traffic.

Figure 1: DDoS defense routing efficiency enabled by SDN and NFV.

Simplified and efficient routing: Network operators today need to employ complex routing hacks to steer traffic through a fixed-location DDoS hardware appliance (deployed either on-premises or in a remote site). As Figure 1 illustrates, this causes additional latency. Consider two end-to-end flows flow_1 and flow_2. Way-pointing flow_1 through the appliance (the left-hand side of the figure) makes the total path length 3 hops. But if we could launch VMs where they are needed (the right-hand side of the figure), we could drop the total path length to 2 hops, a 33% decrease in traffic footprint. Using NFV we can launch defense VMs at the closest location to where they are currently needed, and using SDN we can flexibly route traffic through them.

In summary, we observe new opportunities to build a flexible and elastic DDoS defense mechanism via SDN/NFV. In the next section, we highlight the challenges in realizing these benefits.

3 System Overview 

In this section, we envision the deployment model and workflow of Bohatei, highlight the challenges in realizing our vision, and outline our key ideas to address these challenges.

3.1 Problem scope 

Deployment scenario: For concreteness, we focus on an ISP-centric deployment model, where an ISP offers DDoS-defense-as-a-service to its customers. Note that several ISPs already have such commercial offerings. We envision different monetization avenues. For example, an ISP can offer a value-added security service to its customers that can replace the customers’ in-house DDoS defense hardware. Alternatively, the ISP can allow its customers to use Bohatei as a cloudbursting option when the attack exceeds the capacity of the customers’ on-premise hardware. While we describe our work in an ISP setting, our ideas are general and can be applied to other deployment models; e.g., CDN-based DDoS defense or deployments inside cloud providers.

Figure 2: Bohatei system overview and workflow.

In addition to traditional backbone routers and interconnecting links, we envision the ISP has deployed multiple datacenters as shown in Figure 2. Note that this is not a new requirement; ISPs already have several in-network datacenters and are planning additional rollouts in the near future [15, 23]. Each datacenter has commodity hardware servers and can run standard virtualized network functions.

Threat model: We focus on a general DDoS threat against the victim, who is a customer of the ISP. The adversary’s aim is to exhaust the network bandwidth of the victim. The adversary can flexibly choose from a set of candidate attacks AttackSet = {A_a}_a. As a concrete starting point, we consider the following types of DDoS attacks: TCP SYN flood, UDP flood, DNS amplification, and elephant flow. We assume the adversary controls a large number of bots, but the total budget in terms of the maximum volume of attack traffic it can launch at any given time is fixed. Given the budget, the adversary has complete control over the choice of (1) the type and mix of attacks from the AttackSet (e.g., 60% SYN and 40% DNS) and (2) the set of ISP ingress locations at which the attack traffic enters the ISP. For instance, a simple adversary may launch a single fixed attack A_a arriving at a single ingress, while an advanced adversary may choose a mix of various attack types and multiple ingresses. For clarity, we restrict our presentation to focus on a single customer, noting that it is straightforward to extend our design to support multiple customers.

Defenses: We assume the ISP has a predefined library of defenses specifying a defense strategy for each attack type. For each attack type A_a, the defense strategy is specified as a directed acyclic graph DAG_a representing a typical multi-stage attack analysis and mitigation procedure. Each node of the graph represents a logical module and the edges are tagged with the result of the previous node's processing (e.g., “benign” or “attack” or “analyze further”). Each logical node will be realized by one (or more) virtual appliance(s) depending on the attack volume. Figure 3 shows an example strategy graph with 4 modules used for defending against a UDP flood attack. Here, the first module tracks the number of UDP packets each source sends and performs a simple threshold-based check to decide whether the source needs to be let through or throttled.

Figure 3: A sample defense against UDP flood.

Our goal here is not to develop new defense algorithms but to develop the system orchestration capabilities to enable flexible and elastic defense. As such, we assume the DAGs have been provided by domain experts, DDoS defense vendors, or by consulting best practices.
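To make the notion of a strategy graph concrete, the following minimal sketch encodes a four-module UDP flood defense as a dictionary and implements the threshold-based first module described above. The module names, the edge labels on the later stages, and the wiring between modules are hypothetical; only the per-source packet count and threshold check come from the text.

```python
from collections import Counter

# Hypothetical strategy graph: module -> {edge label: next module}.
UDP_FLOOD_DAG = {
    "check_udp_count_of_src": {"benign": "forward_to_customer", "attack": "rate_limit"},
    "rate_limit": {"done": "log"},
    "forward_to_customer": {},
    "log": {},
}


class UdpCountCheck:
    """First module: count UDP packets per source and compare to a threshold."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.counts = Counter()

    def classify(self, src_ip):
        # Every packet increments the sender's count; sources above the
        # threshold are labelled "attack", all others "benign".
        self.counts[src_ip] += 1
        return "attack" if self.counts[src_ip] > self.threshold else "benign"


def next_module(dag, module, label):
    """Follow the edge tagged with the current module's processing result."""
    return dag[module][label]


check = UdpCountCheck(threshold=3)
for _ in range(5):
    label = check.classify("198.51.100.7")
    print(label, "->", next_module(UDP_FLOOD_DAG, "check_udp_count_of_src", label))
```

Keeping the graph as plain data means the same orchestration logic can serve any attack type whose defense is expressed as a DAG.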

3.2 Bohatei workflow and challenges 

The workflow of Bohatei has four steps (see Figure 2):

1. Attack detection: We assume the ISP uses some out-of-band anomaly detection technique to flag whether a customer is under a DDoS attack [27]. The design of this detection algorithm is outside the scope of this paper. The detection algorithm gives a coarse-grained specification of the suspicious traffic, indicating the customer under attack and some coarse identifications of the type and sources of the attack; e.g., “srcprefix=*,dstprefix=cust,type=SYN”.

2. Attack estimation: Once suspicious traffic is detected, the strategy module estimates the volume of suspicious traffic of each attack type arriving at each ingress.

3. Resource management: The resource manager then uses these estimates as well as the library of defenses to determine the type, number, and location of defense VMs that need to be instantiated. The goal of the resource manager is to efficiently assign available network resources to the defense while minimizing user-perceived latency and network congestion.

4. Network orchestration: Finally, the network orchestration module sets up the required network forwarding rules to steer suspicious traffic to the defense VMs as mandated by the resource manager.

Given this workflow, we highlight the three challenges 
we need to address to realize our vision: 

C1. Responsive resource management: We need an efficient way of assigning the ISP’s available compute and network resources to DDoS defense. Specifically, we need to decide how many VMs of each type to run on each server of each datacenter location so that attack traffic is handled properly while minimizing the latency experienced by legitimate traffic. Doing so in a responsive manner (e.g., within tens of seconds), however, is challenging. Specifically, this entails solving a large NP-hard optimization problem, which can take several hours to solve even with state-of-the-art solvers.


C2. Scalable network orchestration: The canonical view in SDN is to set up switch forwarding rules in a per-flow and reactive manner [40]. That is, every time a switch receives a flow for which it does not have a forwarding entry, the switch queries the SDN controller to get the forwarding rule. Unfortunately, this per-flow and reactive paradigm is fundamentally unsuitable for DDoS defense. First, an adversary can easily saturate the control plane bandwidth as well as the controller compute resources [54]. Second, installing per-flow rules on the switches will quickly exhaust the limited rule space (≈4K TCAM rules). Note that unlike traffic engineering applications of SDN [34], coarse-grained IP prefix-based forwarding policies would not suffice in the context of DDoS defense, as we cannot predict the IP prefixes of future attack traffic.


C3. Dynamic adversaries: Consider a dynamic adversary who can rapidly change the attack mix (i.e., attack type, volume, and ingress point). This behavior can force the ISP to choose between two undesirable options: (1) wasting compute resources by overprovisioning for attack scenarios that may never arrive, or (2) not instantiating the required defenses (to save resources), which will let attack traffic reach the customer.


3.3 High-level approach 

Next we highlight our key ideas to address C1-C3:

• Hierarchical optimization decomposition (§4): To address C1, we use a hierarchical decomposition of the resource optimization problem into two stages. First, the Bohatei global (i.e., ISP-wide) controller uses coarse-grained information (e.g., total spare capacity of each datacenter) to determine how many and what types of VMs to run in each datacenter. Then, each local (i.e., per-datacenter) controller uses more fine-grained information (e.g., location of available servers) to determine the specific server on which each defense VM will run.

• Proactive tag-based forwarding (§5): To address C2, we design a scalable orchestration mechanism using two key ideas. First, switch forwarding rules are based on per-VM tags rather than per-flow to dramatically reduce the size of the forwarding tables. Second, we proactively configure the switches to eliminate frequent interactions between the switches and the control plane [40].

• Online adaptation (§6): To handle a dynamic adversary that changes the attack mix (C3), we design a defense strategy adaptation approach inspired by classical online algorithms for regret minimization.

Figure 4: An illustration of strategy vs. annotated vs. physical graphs. Given annotated graphs and suspicious traffic volumes, the resource manager computes physical graphs.

4 Resource Manager 

The goal of the resource management module is to efficiently determine network and compute resources to analyze and take action on suspicious traffic. The key here is responsiveness: a slow algorithm enables adversaries to nullify the defense by rapidly changing their attack characteristics. In this section, we describe the optimization problem that Bohatei needs to solve and then present a scalable heuristic that achieves near-optimal results.


4.1 Problem inputs 


Before we describe the resource management problem, we establish the main input parameters: the ISP’s compute and network parameters and the defense processing requirements of traffic of different attack types. We consider an ISP composed of a set of edge PoPs^5 E = {E_e}_e and a set of datacenters D = {D_d}_d.


ISP constraints: Each datacenter’s traffic processing capacity is determined by a pre-provisioned uplink capacity C_d^{link} and compute capacity C_d^{compute}. The compute capacity is specified in terms of the number of VM slots, where each VM slot has a given capacity specification (e.g., instance sizes in EC2).


Processing requirements: As discussed earlier in §3.1, different attacks require different strategy graphs. However, the notion of a strategy graph by itself will not suffice for resource management, as it does not specify the traffic volume that each module should process.

^5 We use the terms “edge PoP” and “ingress” interchangeably.

The input to the resource manager is in the form of annotated graphs, as shown in Figure 4. An annotated graph DAG_a^{annotated} is a strategy graph annotated with edge weights, where each weight represents the fraction of the total input traffic to the graph that is expected to traverse the corresponding edge. These weights are precomputed based on prior network monitoring data (e.g., using NetFlow) and from our adaptation module (§6). T_{e,a} denotes the volume of suspicious traffic of type a arriving at edge PoP e. For example, in Figure 4, weight 0.48 from node A_2 to node R_2 means 48% of the total input traffic to the graph (i.e., to A_1) is expected to traverse edge A_2 → R_2.

Since modules may vary in terms of compute complexity and the traffic rate that can be handled per VM slot, we need to account for the parameter P_{a,i}, the traffic processing capacity of a VM (e.g., in terms of compute requirements) for the logical module V_{a,i}, where V_{a,i} is node i of graph DAG_a^{annotated}.
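As a rough illustration of how edge weights and per-VM capacities combine, the sketch below derives the traffic volume reaching each module and the resulting number of VM instances. It assumes that a module's input is the total graph input multiplied by the sum of the weights on its incoming edges, and that the VM count is the ceiling of volume over P_{a,i}. Apart from the 0.48 weight on the A_2 → R_2 edge, all weights, capacities, and names are hypothetical.

```python
import math


def module_volumes(total_volume, edge_weights, source):
    """Traffic reaching each module, given weights as fractions of total input."""
    incoming_share = {source: 1.0}  # the source module sees all of T_{e,a}
    for (_, dst), weight in edge_weights.items():
        incoming_share[dst] = incoming_share.get(dst, 0.0) + weight
    return {node: total_volume * share for node, share in incoming_share.items()}


def vms_needed(volumes, per_vm_capacity):
    """Number of VM instances per module: ceil(volume / P_{a,i})."""
    return {node: math.ceil(volumes[node] / per_vm_capacity[node]) for node in volumes}


# Edge weights are fractions of the total input to the graph (hypothetical
# except for the A2 -> R2 weight of 0.48 mentioned in the text).
edge_weights = {
    ("A1", "A2"): 0.60,
    ("A1", "R1"): 0.40,
    ("A2", "R1"): 0.12,
    ("A2", "R2"): 0.48,
}
per_vm_capacity = {"A1": 3.0, "A2": 2.5, "R1": 4.0, "R2": 4.0}  # Gbps per VM

volumes = module_volumes(10.0, edge_weights, source="A1")  # T_{e,a} = 10 Gbps
print(vms_needed(volumes, per_vm_capacity))
```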

Network footprint: We denote the network-level cost of transferring a unit of traffic from ingress e to datacenter d by L_{e,d}; e.g., this can represent the path latency per byte of traffic. Similarly, within a datacenter, the units of intra-rack and inter-rack traffic costs are denoted by IntraUnitCost and InterUnitCost, respectively (e.g., they may represent latency such that IntraUnitCost < InterUnitCost).

4.2 Problem statement 

Our resource management problem is to translate the annotated graph into a physical graph (see Figure 4); i.e., each node i of the annotated graph DAG_a^{annotated} will be realized by one or more VMs, each of which implements the logical module V_{a,i}.

Fine-grained scaling: To generate physical graphs given annotated graphs in a resource-efficient manner, we adopt a fine-grained scaling approach, where each logical module is scaled independently. We illustrate this idea in Figure 5, which shows an annotated graph with three logical modules A, B, and C, receiving different amounts of traffic and consuming different amounts of compute resources. Once implemented as a physical graph, suppose module C becomes the bottleneck due to its processing capacity and input traffic volume. Using a monolithic approach (e.g., running A, B, and C within a single VM), we will need to scale the entire graph, as shown in Figure 5. Instead, we decouple the modules to enable scaling out individual VMs; this yields higher resource efficiency, as shown in Figure 5.
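The benefit of scaling modules independently can be seen with a small calculation. The sketch below is only an illustration under simplifying assumptions: every module instance costs the same amount of resources, and a monolithic replica always contains one instance of each module. The loads and capacities are hypothetical.

```python
import math


def instances_per_module(load, capacity):
    """Instances each module needs on its own: ceil(load / per-instance capacity)."""
    return {m: math.ceil(load[m] / capacity[m]) for m in load}


def monolithic_instances(load, capacity):
    """Replicate the whole graph until the bottleneck module is satisfied."""
    replicas = max(instances_per_module(load, capacity).values())
    return replicas * len(load)


def fine_grained_instances(load, capacity):
    """Scale each module independently."""
    return sum(instances_per_module(load, capacity).values())


load = {"A": 10, "B": 6, "C": 6}        # input traffic per module (Gbps)
capacity = {"A": 10, "B": 10, "C": 2}   # per-instance capacity; C is the bottleneck

print("monolithic:", monolithic_instances(load, capacity))
print("fine-grained:", fine_grained_instances(load, capacity))
```

With these numbers the bottleneck module C needs three instances, so the monolithic approach replicates all three modules three times, while the fine-grained approach adds instances of C only.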

Goals: Our objective here is to (a) instantiate the VMs across the compute servers throughout the ISP, and (b) distribute the processing load across these servers to minimize the expected latency for legitimate traffic. Further, we want to achieve (a) and (b) while minimizing the footprint of suspicious traffic.^6

Figure 5: An illustration of fine-grained elastic scaling when module C becomes the bottleneck.

To this end, we need to assign values to two key sets of decision variables: (1) the fraction of traffic T_{e,a} to send to each datacenter D_d (denoted by f_{e,a,d}), and (2) the number of VMs of type V_{a,i} to run on server s of datacenter D_d. Naturally, these decisions must respect the datacenters’ bandwidth and compute constraints.

Theoretically, we can formulate this resource management problem as a constrained optimization via an Integer Linear Program (ILP). For completeness, we describe the full ILP in the appendix. Solving the ILP formulation gives an optimal solution to the resource management problem. However, if the ILP-based solution is incorporated into Bohatei, an adversary can easily overwhelm the system. This is because the ILP approach
takes several hours (see Table]^. By the time it computes 
a solution, the adversary may have radically changed the 
attack mix. 

4.3 Hierarchical decomposition 

To solve the resource management problem, we decompose the optimization problem into two subproblems: (1) the Bohatei global controller solves a Datacenter Selection Problem (DSP) to choose datacenters responsible for processing suspicious traffic, and (2) given the solution to the DSP, each local controller solves a Server Selection Problem (SSP) to assign servers inside each selected datacenter to run the required VMs. This decomposition is naturally scalable as the individual SSP problems can be solved independently by datacenter controllers. Next,
we describe practical greedy heuristics for the DSP and 
SSP problems that yield close-to-optimal solutions (see 
Table|g. 

^6 While it is possible to explicitly minimize network congestion [33], minimizing suspicious traffic footprint naturally helps reduce network congestion as well.

Datacenter selection problem (DSP): We design a greedy algorithm to solve DSP with the goal of reducing ISP-wide suspicious traffic footprint. To this end, the algorithm first sorts suspicious traffic volumes (i.e., T_{e,a} values) in decreasing order. Then, for each suspicious traffic volume T_{e,a} from the sorted list, the algorithm tries to assign the traffic volume to the datacenter with the least cost based on L_{e,d} values. The algorithm has two outputs: (1) f_{e,a,d} values denoting what fraction of suspicious traffic from each ingress should be steered to each datacenter (as we will see in §5, these values will be used by network orchestration to steer traffic correspondingly), and (2) the physical graph corresponding to attack type a to be deployed by datacenter d. For completeness, we show the pseudocode for the DSP algorithm in a figure in Appendix B.
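The greedy idea described above can be sketched in a few lines. This is not the pseudocode from the appendix: it is a simplified illustration that assumes a single scalar capacity per datacenter (in Gbps), splits a volume across datacenters when the cheapest one fills up, and computes only the f_{e,a,d} fractions, not the physical graphs. All names and numbers are hypothetical.

```python
def greedy_dsp(traffic, cost, capacity):
    """Assign suspicious traffic to datacenters, largest volume first.

    traffic:  {(ingress, attack): volume}   -- T_{e,a}
    cost:     {(ingress, datacenter): cost} -- L_{e,d}
    capacity: {datacenter: capacity}        -- simplifying assumption
    Returns {(ingress, attack, datacenter): fraction}, i.e. f_{e,a,d}.
    """
    remaining = dict(capacity)
    fractions = {}
    ordered = sorted(traffic.items(), key=lambda item: item[1], reverse=True)
    for (ingress, attack), volume in ordered:
        if volume <= 0:
            continue
        left = volume
        # Try datacenters in increasing order of cost from this ingress.
        for dc in sorted(remaining, key=lambda d: cost[(ingress, d)]):
            share = min(left, remaining[dc])
            if share <= 0:
                continue
            fractions[(ingress, attack, dc)] = share / volume
            remaining[dc] -= share
            left -= share
            if left <= 0:
                break
        if left > 0:
            raise ValueError(f"not enough capacity for {(ingress, attack)}")
    return fractions


traffic = {("E1", "syn"): 60, ("E2", "dns"): 30}
cost = {("E1", "D1"): 1, ("E1", "D2"): 3, ("E2", "D1"): 1, ("E2", "D2"): 2}
capacity = {"D1": 50, "D2": 100}

for key, fraction in sorted(greedy_dsp(traffic, cost, capacity).items()):
    print(key, round(fraction, 3))
```

Because the largest volume is placed first, it claims the cheapest datacenter, and later volumes fall back to costlier ones once capacity runs out.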

Server selection problem (SSP): Intuitively, the SSP algorithm attempts to preserve traffic locality by instantiating nodes adjacent in the physical graph as close as possible within the datacenter. Specifically, given the physical graph, the SSP algorithm greedily tries to assign nodes with higher capacities (based on P_{a,i} values) along with their predecessors to the same server, or the same rack. For completeness, we show the pseudocode for the SSP algorithm in a figure in Appendix B.
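A locality-preserving placement along these lines might look as follows. This is a simplified sketch, not the appendix pseudocode: it assumes each VM occupies exactly one slot, visits VMs in decreasing capacity order, and places each unplaced predecessor on the same server if a slot is free, otherwise in the same rack, otherwise anywhere. Server, rack, and VM names are hypothetical.

```python
def greedy_ssp(capacity, predecessors, servers):
    """Place VMs on servers, keeping predecessors close to high-capacity VMs.

    capacity:     {vm: processing capacity}  -- P_{a,i} of the VM's module
    predecessors: {vm: [predecessor vms]}    -- edges of the physical graph
    servers:      {server: (rack, free_slots)}
    Returns {vm: server}.
    """
    free = {name: slots for name, (_, slots) in servers.items()}
    rack_of = {name: rack for name, (rack, _) in servers.items()}
    placement = {}

    def place(vm, near):
        candidates = [s for s in free if free[s] > 0]
        if not candidates:
            raise ValueError("no free VM slot left")

        def distance(server):
            # 0: same server (or no preference), 1: same rack, 2: other rack.
            if near is None or server == near:
                return 0
            return 1 if rack_of[server] == rack_of[near] else 2

        # Prefer the closest server, then the emptiest, then a stable name order.
        best = min(candidates, key=lambda s: (distance(s), -free[s], s))
        placement[vm] = best
        free[best] -= 1

    for vm in sorted(capacity, key=lambda v: (-capacity[v], v)):
        if vm not in placement:
            place(vm, None)
        for pred in predecessors.get(vm, []):
            if pred not in placement:
                place(pred, placement[vm])
    return placement


servers = {"s1": ("r1", 2), "s2": ("r1", 1), "s3": ("r2", 2)}
vm_capacity = {"A": 1, "B": 2, "C": 5}
vm_predecessors = {"C": ["B"], "B": ["A"]}
print(greedy_ssp(vm_capacity, vm_predecessors, servers))
```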

5 Network Orchestration 


Given the outputs of the resource manager module (i.e., assignment of datacenters to incoming suspicious traffic and assignment of servers to defense VMs), the role of the network orchestration module is to configure the network to implement these decisions. This includes setting up forwarding rules in the ISP backbone and inside the datacenters. The main requirement is scalability in the presence of attack traffic. In this section, we present our tag-based and proactive forwarding approach to address the limitations of the per-flow and reactive SDN approach.


5.1 High-level idea 


As discussed earlier in §3.2, the canonical SDN view of setting up switch forwarding rules in a per-flow and reactive manner is not suitable in the presence of DDoS attacks. Furthermore, there are practical scalability and deployability concerns with using SDN in ISP backbones [21, 29]. There are two main ideas in our approach to address these limitations:


• Following the hierarchical decomposition in resource management, we also decompose the network orchestration problem into two subproblems: (1) wide-area routing to get traffic to datacenters, and (2) intra-datacenter routing to get traffic to the right VM instances. This decomposition allows us to use different network-layer techniques; e.g., SDN is more suitable inside the datacenter while traditional MPLS-style routing is better suited for wide-area routing.

• Instead of the controller reacting to each flow arrival, we proactively install forwarding rules before traffic arrives. Since we do not know the specific IP-level suspicious flows that will arrive in the future, we use logical tag-based forwarding rules with per-VM tags instead of per-flow rules.

5.2 Wide-area orchestration 


The Bohatei global controller sets up forwarding rules
on backbone routers so that traffic detected as suspicious
is steered from edge PoPs to datacenters according to
the resource management decisions specified by the $f_{e,a,d}$
values (see §4.3).

To avoid a forklift upgrade of the ISP backbone and
enable immediate adoption of Bohatei, we use traditional
tunneling mechanisms in the backbone (e.g.,
MPLS or IP tunneling). We proactively set up static
tunnels from each edge PoP to each datacenter. Once
the global controller has solved the DSP problem, the
controller configures backbone routers to split the traffic
according to the $f_{e,a,d}$ values. While our design is
not tied to any specific tunneling scheme, the widespread
use of MPLS and IP tunneling makes them natural candidates.


5.3 Intra-datacenter orchestration 


Inside each datacenter, the traffic needs to be steered 
through the intended sequence of VMs. There are two 
main considerations here: 

1. The next VM a packet needs to be sent to depends on
the context of the current VM. For example, the node
"check UDP count of src" in the UDP flood defense graph
shown earlier may send traffic to either "forward to customer"
or "log" depending on its analysis outcome.

2. With elastic scaling, we may instantiate several phys¬ 
ical VMs for each logical node depending on the de¬ 
mand. Conceptually, we need a “load balancer” at 
every level of our annotated graph to distribute traf¬ 
fic across different VM instances of a given logical 
node. 

Note that we can trivially address both requirements 
using a per-flow and reactive solution. Specifically, a lo¬ 
cal controller can track a packet as it traverses the phys¬ 
ical graph, obtain the relevant context information from 
each VM, and determine the next VM to route the traf¬ 
fic to. However, this approach is clearly not scalable and 
can introduce avenues for new attacks. The challenge 
here is to meet these requirements without incurring the 
overhead of this per-flow and reactive approach. 
Encoding processing context: Instead of having the
controller track the context, our high-level idea is to encode
the necessary context as tags inside packet headers.
Consider the example shown in Figure 6, composed
of VMs $A_{1,1}$, $A_{2,1}$, and $R_{1,1}$. $A_{1,1}$ encodes the
processing context of outgoing traffic as tag values embedded
in its outgoing packets (i.e., tag values 1 and 2 denote
benign and attack traffic, respectively). The switch then
uses this tag value to forward each packet to the correct
next VM.

Figure 6: Context-dependent forwarding using tags.

Figure 7: Different load balancer design points.

¹ We assume the ISP uses legacy mechanisms for forwarding non-attack
traffic and traffic to non-Bohatei customers, so these are not the
focus of our work.

Tag-based forwarding addresses the control channel
bottleneck and switch rule explosion. First, the tag generation
and tag-based forwarding behavior of each VM
and switch is configured proactively once the local controller
has solved the SSP. We proactively assign a tag
for each VM and populate forwarding rules before flows
arrive; e.g., in Figure 6, the tag table of $A_{1,1}$ and the forwarding
table of the router have already been populated
as shown. Second, this reduces router forwarding rules,
as illustrated in Figure 6. Without tagging, there will be
one rule for each of the 1000 flows. Using tag-based forwarding,
we achieve the same forwarding behavior using
only two forwarding rules.
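
To make the rule-count argument concrete, the following minimal Python sketch contrasts a per-flow rule table with a tag-based one. It is an illustration under stated assumptions, not Bohatei code: the function names, the port labels, and the choice of which flows count as attack traffic are all hypothetical. Only the tag values (1 for benign, 2 for attack) and the 1000-flow scenario come from the example above.

```python
# Illustrative sketch only; names and port labels are hypothetical.

BENIGN_TAG = 1  # tag value for benign traffic, as in the example above
ATTACK_TAG = 2  # tag value for attack traffic


def build_per_flow_rules(flows):
    """Reactive per-flow model: the switch holds one rule per flow."""
    return {
        flow_id: ("port_attack" if is_attack else "port_benign")
        for flow_id, is_attack in flows.items()
    }


def build_tag_rules():
    """Proactive tag-based model: the switch holds one rule per tag value."""
    return {BENIGN_TAG: "port_benign", ATTACK_TAG: "port_attack"}


def tag_packet(is_attack):
    """Analysis VM side: encode the processing context as a tag."""
    return ATTACK_TAG if is_attack else BENIGN_TAG


# 1000 flows; every third one is (arbitrarily) treated as attack traffic.
flows = {f"flow-{i}": (i % 3 == 0) for i in range(1000)}
per_flow_rules = build_per_flow_rules(flows)
tag_rules = build_tag_rules()

print(len(per_flow_rules), "per-flow rules vs.", len(tag_rules), "tag rules")
```

Both tables send every packet to the same next hop; the tag-based table just keys on the analysis outcome rather than on the flow identity, so its size does not grow with the number of flows.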


Scale-out load balancing: One could interconnect VMs
of the same physical graph as shown in Figure 7a using
a dedicated load balancer (LB). However,
such a load balancer may itself become a bottleneck, as
it is on the path of every packet from any VM in the set
$\{A_{1,1}, A_{1,2}\}$ to any VM in the set $\{R_{1,1}, R_{1,2}, R_{1,3}\}$. To
circumvent this problem, we implement the distribution
strategy inside each VM so that the load balancer capability
scales proportionally to the current number of VMs.
Consider the example shown in Figure 7b, where due to
an increase in attack traffic volume we have added one
more VM of type $A_1$ (denoted by $A_{1,2}$) and one more
VM of type $R_1$ (denoted by $R_{1,2}$). To load balance traffic
between the two VMs of type $R_1$, the load balancers of the
$A_1$ VMs (shown as $LB_{1,1}$ and $LB_{1,2}$ in the figure) pick a
tag value from a tag pool (shown by {2,3} in the figure)
based on the processing context of the outgoing packet
and the intended load balancing scheme (e.g., uniformly
at random to distribute load equally). Note that this tag
pool is pre-populated by the local controller (given the
defense library and the output of the resource manager
module). This scheme thus satisfies the load balancing
requirement in a scalable manner.
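
The per-VM load balancer described above can be sketched as follows. This is a minimal illustration, assuming a uniform-random scheme; the class name `EmbeddedLoadBalancer`, the context labels, and the seed are hypothetical, and only the tag pool {2,3} is taken from the text.

```python
import random
from collections import Counter


class EmbeddedLoadBalancer:
    """Hypothetical load balancer that lives inside an analysis VM.

    tag_pools maps a processing context to the pool of tags that the
    local controller pre-populated for that context.
    """

    def __init__(self, tag_pools, seed=None):
        self.tag_pools = {ctx: list(tags) for ctx, tags in tag_pools.items()}
        self.rng = random.Random(seed)

    def pick_tag(self, context):
        # Uniformly at random, so downstream VMs receive equal load on average.
        return self.rng.choice(self.tag_pools[context])


# One downstream VM for benign traffic, two for attack traffic (pool {2, 3}).
lb = EmbeddedLoadBalancer({"benign": [1], "attack": [2, 3]}, seed=7)
counts = Counter(lb.pick_tag("attack") for _ in range(10_000))
print(dict(counts))
```

Because every upstream VM carries its own copy of this logic, adding upstream VMs adds load-balancing capacity at the same time, which is the point of the scale-out design.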

Other issues: There are two remaining practical issues: 

• Number of tag bits: We give a simple upper bound on
the required number of bits to encode tags. First, to
support context-dependent forwarding out of a VM
with $k$ relevant contexts, we need $k$ distinct tag values.
Second, to support load balancing among $l$ VMs
of the same logical type, each VM needs to be populated
with a tag pool including $l$ tags. Thus, at each
VM we need at most $k \times l$ distinct tag values. Therefore,
an upper bound on the total number of unique
tag values is $k_{max} \times l_{max} \times \sum_{a} |V_a^{annotated}|$, where $k_{max}$
and $l_{max}$ are the maximum number of contexts and
VMs of the same type in a graph, and $V_a^{annotated}$ is
the set of vertices of the annotated graph for attack type
$a$. To make this concrete, across the evaluation experiments
the maximum number of required tags was
800, which can be encoded in $\lceil \log_2(800) \rceil = 10$ bits.
In practice, this tag space requirement of Bohatei
can be easily satisfied given that datacenter-grade
networking platforms already have extensible header
fields.

• Bidirectional processing: Some logical modules may
have bidirectional semantics. For example, in the case
of a DNS amplification attack, request and response
traffic must be processed by the same VM. (In other
cases, such as the UDP flood attack, bidirectionality
is not required.) To enforce bidirectionality, ISP
edge switches use the tag values of outgoing traffic so
that when the corresponding incoming traffic comes
back, the edge switches send it to the datacenter within
which the VM that processed the outgoing traffic is
located. Within the datacenter, using this tag value,
the traffic is steered to the VM.
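
The tag-bit bound from the first item above can be checked with a few lines of Python. This is a sketch only: the helper names and the example values of `k_max`, `l_max`, and the per-attack vertex counts are hypothetical; the 800-tag figure is the one reported in the text.

```python
def max_tag_values(k_max, l_max, vertices_per_attack):
    """Upper bound: k_max * l_max * (sum over attack types of |V_a|)."""
    return k_max * l_max * sum(vertices_per_attack)


def bits_needed(num_values):
    """Smallest number of bits that can encode num_values distinct tags."""
    return max(1, (num_values - 1).bit_length())


# The maximum number of tags reported for the evaluation experiments.
print(bits_needed(800))

# A made-up configuration: 2 contexts, 10 VMs per type, 4 attack graphs.
bound = max_tag_values(k_max=2, l_max=10, vertices_per_attack=[4, 5, 6, 5])
print(bound, bits_needed(bound))
```

Integer `bit_length` is used instead of a floating-point logarithm so that exact powers of two are handled without rounding surprises.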

6 Strategy Layer 

As we saw in §4, a key input to the resource manager
module is the set of $T_{e,a}$ values, which represents the volume
of suspicious traffic of each attack type $a$ arriving at
each edge PoP $e$. This means we need to estimate the future
attack mix based on observed measurements of the
network and then instantiate the required defenses. We
begin by describing an adversary that intends to thwart
a Bohatei-like system. Then, we discuss limitations of
strawman solutions before describing our online adaptation
mechanism.

Interaction model: We model the interaction between 
the ISP running Bohatei and the adversary as a repeated 
interaction over several epochs. The ISP’s “move” is 
one epoch behind the adversary; i.e., it takes Bohatei an 
epoch to react to a new attack scenario due to implemen¬ 
tation delays in Bohatei operations. The epoch duration 
is simply the sum of the time to detect the attack, run the 
resource manager, and execute the network orchestration 
logic. While we can engineer the system to minimize this 
lag, there will still be non-zero delays in practice and thus 
we need an adaptation strategy. 

Objectives: Given this interaction model, the ISP has to
pre-allocate VMs and hardware resources for a specific 
attack mix. An intelligent and dynamic adversary can 
change its attack mix to meet two goals: 

G1 Increase hardware resource consumption: The adversary
can cause the ISP to overprovision defense VMs.
This may impact the ISP’s ability to accommodate
other attack types or reduce profits from other services
that could have used the infrastructure.

G2 Succeed in delivering attack traffic: If the ISP’s de¬ 
tection and estimation logic is sub-optimal and does 
not have the required defenses installed, then the ad¬ 
versary can maximize the volume of attack traffic de¬ 
livered to the target. 

The adversary’s goal is to maximize these objectives,
while the ISP’s goal is to minimize them to the extent possible.
One could also consider a third objective of collateral
damage on legitimate traffic; e.g., introducing needless
delays. We do not discuss this dimension because
our optimization algorithm from §4 will naturally push
the defense as close to the ISP edge (i.e., traffic ingress
points) as possible to minimize the impact on legitimate
traffic.

Threat model: We consider an adversary with a fixed 
budget in terms of the total volume of attack traffic it can 
launch at any given time. Note that the adversary can 
apportion this budget across the types of attacks and the 
ingress locations from which the attacks are launched. 
Formally, we have $\sum_{e} \sum_{a} T_{e,a} \le B$, but there are no
constraints on the specific $T_{e,a}$ values.

Limitations of strawman solutions: For simplicity, let
us consider a single ingress point and a
strawman solution called PrevEpoch, where we measure
the attack observed in the previous epoch and use it as the
estimate for the next epoch. Unfortunately, this can have
serious issues w.r.t. goals G1 and G2. To see why, consider
a simple scenario where we have two attack types
with a budget of 30 units and three epochs with the attack
volumes as follows: T1: A1=10, A2=0; T2: A1=20,
A2=0; T3: A1=0, A2=30. Now consider the PrevEpoch
strategy starting at the (0,0) configuration. Over the three epochs it has a
wastage of 0, 0, 20 units and an evasion of 10, 10, 30
units because it has overfit to the previous measurement.
We can also consider other strategies; e.g., a Uniform
strategy that provisions 15 units each for A1 and A2, or
extensions of these that overprovision by multiplying
the number of VMs given by the resource manager in the
last epoch by a fixed value $\gamma > 1$. However, these suffer
from the same problems and are not competitive.
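
The per-epoch wastage and evasion figures in the scenario above can be reproduced with a short Python sketch. The function names and the dictionary representation of an attack mix are assumptions made for illustration; "wastage" is taken to be provisioned capacity that sees no matching attack traffic, and "evasion" is attack traffic in excess of what was provisioned.

```python
def evaluate(provisions, attacks):
    """Return per-epoch (wastage, evasion) lists for given provisioning decisions."""
    wastage, evasion = [], []
    for provision, attack in zip(provisions, attacks):
        wastage.append(sum(max(provision[a] - attack[a], 0) for a in attack))
        evasion.append(sum(max(attack[a] - provision[a], 0) for a in attack))
    return wastage, evasion


def prev_epoch_provisions(attacks, initial):
    """PrevEpoch: provision for epoch t exactly what was observed in epoch t-1."""
    return [initial] + attacks[:-1]


def uniform_provisions(attacks, budget):
    """Uniform: split the budget evenly across attack types in every epoch."""
    types = list(attacks[0])
    return [{a: budget / len(types) for a in types} for _ in attacks]


attacks = [{"A1": 10, "A2": 0}, {"A1": 20, "A2": 0}, {"A1": 0, "A2": 30}]

prev_w, prev_e = evaluate(prev_epoch_provisions(attacks, {"A1": 0, "A2": 0}), attacks)
unif_w, unif_e = evaluate(uniform_provisions(attacks, budget=30), attacks)

print("PrevEpoch wastage/evasion:", prev_w, prev_e)
print("Uniform   wastage/evasion:", unif_w, unif_e)
```

Under these definitions the Uniform strategy evades less in this particular trace but wastes capacity in every epoch, which illustrates why neither strawman is competitive on both goals.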


Online adaptation: Our metric of success here is to
have low regret measured with respect to the best static
solution computed in hindsight [36]. Note that in general,
it is not possible to be competitive w.r.t. the best
dynamic solution since that presumes oracle knowledge
of the adversary, which is not practical.

Intuitively, if we have a non-adaptive adversary, using
the observed empirical average is the best possible static
hindsight estimation strategy; i.e.,

$$T^{*}_{e,a} = \frac{\sum_{t} T_{e,a,t}}{|t|}$$

would be the optimal solution ($|t|$ denotes the total number of
epochs). However, an attacker who knows that we are
using this strategy can game the system by changing the
attack mix. To address this, we use a follow the perturbed
leader (FPL) strategy [36], where our estimation
uses a combination of the past observed behavior of the
adversary and a randomized component. Intuitively, the
random component makes it impossible for the attacker
to predict the ISP’s estimates. This is a well-known approach
in online algorithms to minimize the regret.
Specifically, the traffic estimates for the next epoch $t+1$,
denoted by $\hat{T}_{e,a,t+1}$ values, are calculated based on the
average of the past values plus a random component:

$$\hat{T}_{e,a,t+1} = \frac{\sum_{t'=1}^{t} T_{e,a,t'}}{t} + randperturb.$$

Here, $T_{e,a,t'}$ is the empirically observed value of the
attack traffic and $randperturb$ is a random value drawn
uniformly from $\left[0, \frac{2 \times B}{nextEpoch \times |E| \times |A|}\right]$. (This assumes a
total defense budget of $2 \times B$.) It can be shown that
this is indeed a provably good regret minimization strategy;
we do not show the proof for brevity.
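
The estimation step can be sketched in Python as follows. This is a minimal sketch, not the authors' implementation: the function name and arguments are hypothetical, and the perturbation range used here, 2B / (nextEpoch × |E| × |A|) with nextEpoch taken to be the index of the epoch being estimated, is an assumption of this sketch.

```python
import random


def fpl_estimate(history, budget, n_ingress, n_types, rng):
    """Follow-the-perturbed-leader style estimate for one (ingress, attack type) pair.

    history   -- observed attack volumes for this pair in epochs 1..t
    budget    -- adversary budget B
    n_ingress -- number of ingress points |E|
    n_types   -- number of attack types |A|
    rng       -- random.Random instance (passed in for reproducibility)
    """
    t = len(history)
    average = sum(history) / t
    next_epoch = t + 1  # assumption: nextEpoch is the epoch being estimated
    upper = 2 * budget / (next_epoch * n_ingress * n_types)
    return average + rng.uniform(0, upper)


rng = random.Random(42)
observed = [10, 20, 0]  # volumes of one attack type at one ingress
estimate = fpl_estimate(observed, budget=30, n_ingress=1, n_types=2, rng=rng)
print(round(estimate, 2))
```

The random term shrinks as more epochs are observed, so the estimate converges toward the empirical average while remaining unpredictable to the adversary in early epochs.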


7 Implementation 


In this section, we briefly describe how we implemented 
the key functions described in the previous sections. We 
have made the source code available [1].

7.1 DDoS defense modules 


The design of the Bohatei strategy layer is inspired by
the prior modular efforts in Click and Bro. This
modularity has two advantages. First, it allows us to
adopt best-of-breed solutions and compose them for different
attacks. Second, it enables more fine-grained scaling.
At a high level, there are two types of logical building
blocks in our defense library:

1. Analysis (A): Each analysis module processes a sus¬ 
picious flow and determines appropriate action (e.g., 
more analysis or specific response). It receives a 
packet and outputs a tagged packet, and the tags are 
used to steer traffic to subsequent analysis and re¬ 
sponse module instances as discussed earlier. 

2. Response (R): The input to an R module is a tagged 
packet from some A module. Typical responses in¬ 
clude forward to customer (for benign traffic), log, 
drop, and rate limit. Response functions will depend 
on the type of attack; e.g., sending RST packets in 
case of a TCP SYN attack. 

Next, we describe defenses we have implemented for 
different DDoS attacks. Our goal here is to illustrate the 
flexibility Bohatei provides in dealing with a diverse set 
of known attacks rather than develop new defenses. 

1. SYN flood (Figure 8): We track the number of open
TCP sessions for each source IP; if a source IP has
no asymmetry between SYNs and ACKs, then we mark
its packets as benign. If a source IP never completes
a connection, then we can mark its future packets as
known attack packets. If we see a gray area where the
source IP has completed some connections but not
others, we use a SYN-Proxy defense.

2. DNS amplification (Figure 9): We check if the DNS
server has been queried by some customer IP. This
example highlights another advantage: we can decouple
fast (e.g., the header-based A_LIGHTCHECK
module) and slow path analyses (e.g., the second A
module needs to look into payloads). The responses
are quite simple and implement logging, dropping, or
basic forwarding to the destination. We do not show
the code for brevity.

3. UDP flood: The analysis node A_UDP identifies
source IPs that send an anomalously high number
of UDP packets and uses this to categorize each
packet as either attack or benign. The function
forward will direct the packet to the next node in the
defense strategy; i.e., R_OK if benign, or R_LOG if
attack.

4. Elephant flow: Here, the attacker launches legiti¬ 
mate but very large flows. The A module detects ab¬ 
normally large flows and flags them as attack flows. 
The response is to randomly drop packets from these 
large flows (not shown). 
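
As an illustration of how an analysis module maps its observations to a tag, the three-way decision described for the SYN flood defense can be sketched in Python. The function name, the verdict labels, and the use of raw SYN/ACK counters per source IP are assumptions of this sketch; the actual A_SYNFLOOD module is implemented with Bro (see Table 1).

```python
def classify_source(syn_count, ack_count):
    """Classify a source IP from its SYN and handshake-completing ACK counts.

    Returns one of three hypothetical verdicts:
      "benign"    -- no asymmetry between SYNs and ACKs
      "attack"    -- the source never completed a connection
      "syn_proxy" -- gray area: some connections completed, others not
    """
    if ack_count >= syn_count:
        return "benign"
    if ack_count == 0:
        return "attack"
    return "syn_proxy"


# Per-source counters that a hypothetical tracker might have collected.
observations = {
    "198.51.100.7": (12, 12),   # every SYN was followed by an ACK
    "203.0.113.9": (500, 0),    # SYNs only
    "192.0.2.44": (40, 25),     # mixed behavior
}
verdicts = {ip: classify_source(s, a) for ip, (s, a) in observations.items()}
print(verdicts)
```

Each verdict would then be encoded as a tag on the outgoing packet so that the switch can steer it to the matching response module.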

Attack detection: We use simple time series anomaly
detection using nfdump, a tool that provides NetFlow-like
capabilities, and custom code. The output of the
detection module is sent to the Bohatei global controller
as a 3-tuple (Type, FlowSpec, Volume), where Type indicates
the type of DDoS attack (e.g., SYN flood, DNS amplification),
FlowSpec provides a generic description of
the flow space of suspicious traffic (involving wildcards),
and Volume indicates the suspicious traffic volume based
on the flow records. Note that this FlowSpec does not
pinpoint specific attack flows; rather, it is a coarse-grained
hint on characteristics of suspicious traffic that
needs further processing through the defense graphs.

Figure 8: SYN Flood defense strategy graph.

Figure 9: DNS amplification defense strategy graph.

7.2 SDN/NFV platform 

Control plane: We use the OpenDayLight network
control platform, as it has gained significant traction
from key industry players. We implemented the
Bohatei global and local control plane modules (i.e.,
strategy, resource management, and network orchestration)
as separate OpenDayLight plugins. Bohatei uses
OpenFlow for configuring switches; this is purely
for ease of prototyping, and it is easy to integrate other
network control APIs (e.g., YANG/NETCONF).

Data plane: Each physical node is realized using a
VM running on KVM. We use open source tools (e.g.,
Snort, Bro) to implement the different Analysis (A) and
Response (R) modules. Table 1 summarizes the specific
platforms we have used. These tools are instrumented using
FlowTags to add tags to outgoing packets to provide
contextual information. We used Open vSwitch
to emulate switches in both datacenters and the ISP backbone.
The choice of Open vSwitch is for ease of prototyping
on our testbed.

Attack type    Analysis                          Response
-------------  --------------------------------  --------------------------------
UDP flood      A_UDP using Snort (inline mode)   R_LOG using iptables and
                                                 R_RATELIMIT using tc library
DNS amp.       both LIGHTCHECK and MATCHRQST     R_LOG and R_DROP using iptables
               using netfilter library,
               iptables, custom code
SYN flood      A_SYNFLOOD using Bro              R_SYNPROXY using PF firewall,
                                                 R_LOG and R_DROP using iptables
Elephant flow  A_ELEPHANT using netfilter        R_DROP using iptables
               library, iptables, custom code

Table 1: Implementation of Bohatei modules.

Resource management algorithms: We implement the 
DSP and SSP algorithms using custom Go code. 

8 Evaluation 


In this section, we show that:

1. Bohatei is scalable and handles attacks of hundreds
of Gbps in large ISPs and that our design decisions
are crucial for its scale and responsiveness (§8.1).

2. Bohatei enables a rapid (< 1 minute) response for
several canonical DDoS attack scenarios (§8.2).

3. Bohatei can successfully cope with several dynamic
attack strategies (§8.3).


Setup and methodology: We use a combination of real 
testbed and trace-driven evaluations to demonstrate the 
above benefits. Here we briefly describe our testbed, 
topologies, and attack configurations: 

• SDN Testbed: Our testbed has 13 Dell R720 ma¬ 
chines (20-core 2.8 GHz Xeon CPUs, 128GB RAM). 
Each machine runs KVM on CentOS 6.5 (Linux kernel
v2.6.32). On each machine, we assigned an equal
amount of resources to each VM: 1 vCPU (virtual
CPU) and 512 MB of memory.


• Network topologies: We emulate several router-level
ISP topologies (6-196 nodes) from the Internet
Topology Zoo. We set the bandwidth of each
core link to be 100 Gbps and the link latency to be 10 ms.
The number of datacenters, which are located randomly,
is 5% of the number of backbone switches,
with a capacity of 4,000 VMs per datacenter.


• Benign traffic demands: We assume a gravity model
of traffic demands between ingress-egress switch
pairs. The total volume is scaled linearly with
the size of the network such that the average link load
on the topology backbone is 24 Gbps with a maximum
bottleneck link load of 55 Gbps. We use iperf
and custom code to generate benign traffic.

• Attack traffic: We implemented custom modules to
generate attack traffic: (1) SYN flood attacks, by sending
only SYN packets with spoofed IP addresses at
a high rate; (2) DNS amplification, using an OpenDNS
server with BIND (version 9.8) and emulating an attacker
sending DNS requests with spoofed source
IPs; (3) elephant flows, using iperf to create some
fixed-bandwidth traffic; and (4) UDP
flood attacks. We randomly pick one edge PoP as
the target and vary the target across runs. We ramp
up the attack volume until it induces the maximum reduction
in throughput of benign flows to the target.
On our testbed, we can ramp up the volume up to
10 Gbps. For larger attacks, we use simulations.

Topology   #Nodes   Run time (secs)        Optimality gap
                    Baseline    Bohatei
--------   ------   --------    -------    --------------
Heanet     6        205         0.002      0.0003
OTEGlobe   92       2234        0.007      0.0004
Cogent     196      > 1 hr      0.01       0.0005

Table 2: Run time and optimality gap of Bohatei vs.
ILP formulation across different topologies.

Figure 10: Bohatei control plane scalability.

8.1 Bohatei scalability 


Resource management: Table 2 compares the run time
and optimality of the ILP-based algorithm and Bohatei
(i.e., DSP and SSP) for 3 ISP topologies of various sizes.
(We have results for several other topologies but do not
show them for brevity.) The ILP approach takes from several
tens of minutes to hours, whereas Bohatei takes only
a few milliseconds, enabling rapid response to changing
traffic patterns. The optimality gap is < 0.04%.


Control plane responsiveness: Figure 10 shows the
per-flow setup latency comparing Bohatei to the SDN
per-flow and reactive paradigm as the number of attack
flows in a DNS amplification attack increases. (The results
are consistent for other types of attacks and are not
shown for brevity.) In both cases, we have a dedicated
machine for the controller with eight 2.8 GHz cores and 64
GB RAM. To put the number of flows in context, 200K
flows roughly corresponds to a 1 Gbps attack. Note that a
typical upper bound for switch flow set-up time is on the
order of a few milliseconds [59]. We see that Bohatei incurs
zero rule setup latency, while the reactive approach
deteriorates rapidly as the attack volume increases.

Figure 11: Number of switch forwarding rules in Bohatei
vs. today’s flow-based forwarding.


Number of forwarding rules: Figure 11 shows the
maximum number of rules required on a switch across
different topologies for the SYN flood attack. Using today’s
flow-based forwarding, each new flow will require
a rule. Using tag-based forwarding, the number of rules
depends on the number of VM instances, which reduces
the switch rule space by four orders of magnitude. For
other attack types, we observed consistent results (not
shown). To put this in context, the typical capacity of an
SDN switch is 3K-4K rules (shared across various network
management tasks). This means that per-flow rules
will not suffice for attacks beyond 10 Gbps. In contrast,
Bohatei can handle hundreds of Gbps of attack traffic;
e.g., a 1 Tbps attack will require < 1K rules on a switch.


Benefit of scale-out load balancing: We measured the
resources that would be consumed by a dedicated load
balancing solution. Across different types of attacks with
a fixed rate of 10 Gbps, we observed that a dedicated load
balancer design requires between 220 and 300 VMs for load
balancing alone. By delegating the load balancing task
to the VMs, our design obviates the need for these extra
load balancers (not shown).


8.2 Bohatei end-to-end effectiveness 


We evaluated the effectiveness of Bohatei under four different
types of DDoS attacks. We launch the attack traffic
of the corresponding type at the 10th second; the attack
is sustained for the duration of the experiment. In each
scenario, we choose the attack volume such that it is capable
of bringing the throughput of the benign traffic to
zero. Figure 12 shows the impact of attack traffic on the
throughput of benign traffic. The Y axis for each scenario
shows the network-wide throughput for TCP traffic
(a total of 10 Gbps if there is no attack). The results
shown in this figure are based on Cogent, the largest
topology with 196 switches; the results for other topologies
were consistent and are not shown. While we do see
some small differences across attacks, the overall reaction
time is short.

Figure 12: Bohatei enables rapid response and restores
throughput of legitimate traffic.

The key takeaway is that Bohatei can help networks
respond rapidly (within one minute) to diverse attacks
and restore the performance of legitimate flows. We repeated
the experiments with UDP as the benign traffic.
In this case, the recovery time was even shorter, as the
throughput does not suffer from the congestion control
mechanism of TCP.

Attack type          # VMs needed
                     Monolithic   Fine-grained scaling
-----------------    ----------   --------------------
DNS Amplification    5,422        1,005
SYN Flood            3,167        856
Elephant flows       1,948        910
UDP flood            3,642        1,253

Table 3: Total hardware provisioning cost needed to
handle a 100 Gbps attack for different attacks.

Hardware cost: We measure the total number of VMs
needed to handle a given attack volume and compare two
cases: (1) monolithic VMs embedding the entire defense
logic for an attack, and (2) using Bohatei’s fine-grained
modular scaling. Table 3 shows the number of VMs
required to handle different types of 100 Gbps attacks.
Fine-grained scaling gives a 2.1-5.4x reduction in hardware
cost vs. monolithic VMs. Assuming a commodity
server costs $3,000 and can run 40 VMs in Bohatei (as
we did), we see that it takes a total hardware cost of less
than about $32,000 to handle a 100 Gbps attack across
Table 3. This is in contrast to the total server cost of
about $160,000 for the same scenario if we use monolithic
VMs. Moreover, since Bohatei is horizontally scalable
by construction, dealing with larger attacks simply
entails a linear scale-up of the number of VMs.
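
The reduction factors quoted above follow directly from the VM counts in Table 3. The short Python sketch below recomputes them; the variable names are hypothetical and the numbers are copied from the table.

```python
# VM counts from Table 3: (monolithic, fine-grained scaling).
vms_needed = {
    "DNS Amplification": (5422, 1005),
    "SYN Flood": (3167, 856),
    "Elephant flows": (1948, 910),
    "UDP flood": (3642, 1253),
}

# Reduction factor = monolithic VMs / fine-grained VMs, rounded to one decimal.
reduction = {
    attack: round(monolithic / fine_grained, 1)
    for attack, (monolithic, fine_grained) in vms_needed.items()
}

for attack, factor in reduction.items():
    print(f"{attack}: {factor}x fewer VMs")
print("range:", min(reduction.values()), "to", max(reduction.values()))
```

The smallest gain is for elephant flows and the largest for DNS amplification, which matches the 2.1-5.4x range stated in the text.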

Routing efficiency: To quantify how Bohatei addresses
the routing inefficiency of existing solutions (§2.2), we
ran the following experiment. For each topology, we
measured the end-to-end latency in two equivalently provisioned
scenarios: (1) the location of the DDoS defense
appliance is the node with the highest betweenness
value², and (2) Bohatei. As a baseline, we consider
shortest path routing without attacks. The main conclusion
in Figure 13 is that Bohatei reduces traffic latency
by 20% to 65% across different scenarios.

² Betweenness is a measure of a node’s centrality, which is the fraction
of the network’s all-pairs shortest paths that pass through that node.

Figure 13: Routing efficiency in Bohatei.

Figure 14: Effect of different adaptation strategies
(bars) vs. different attacker strategies (X axis).


8.3 Dynamic DDoS attacks 

We consider the following dynamic DDoS attack strategies:
(1) RandIngress: In each epoch, pick a random
subset of attack ingresses and distribute the attack budget
evenly across attack types; (2) RandAttack: In each
epoch, pick a random subset of attack types and distribute
the budget evenly across all ingresses; (3) RandHybrid:
In each epoch, pick a random subset of ingresses
and attack types independently and distribute the attack
budget evenly across selected pairs; (4) Steady: The adversary
picks a random attack type and a subset of ingresses
and sustains it during all epochs; and (5) FlipPrevEpoch:
This is conceptually equivalent to conducting
two Steady attacks A1 and A2 with each being active
during odd and even epochs, respectively.
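
The three randomized strategies can be sketched as budget-allocation functions in Python. This is one possible reading, stated as an assumption: each function returns a mapping from (ingress, attack type) pairs to traffic volume whose total equals the budget, and "evenly" is interpreted as an equal share for every selected pair. The function names and example identifiers are hypothetical.

```python
import random


def _random_subset(items, rng):
    """Pick a non-empty random subset of items."""
    return rng.sample(items, rng.randint(1, len(items)))


def _spread(ingresses, attack_types, budget):
    """Split the budget equally over all (ingress, attack type) pairs."""
    share = budget / (len(ingresses) * len(attack_types))
    return {(e, a): share for e in ingresses for a in attack_types}


def rand_ingress(ingresses, attack_types, budget, rng):
    """Random subset of ingresses, all attack types."""
    return _spread(_random_subset(ingresses, rng), attack_types, budget)


def rand_attack(ingresses, attack_types, budget, rng):
    """Random subset of attack types, all ingresses."""
    return _spread(ingresses, _random_subset(attack_types, rng), budget)


def rand_hybrid(ingresses, attack_types, budget, rng):
    """Independent random subsets of both ingresses and attack types."""
    return _spread(_random_subset(ingresses, rng),
                   _random_subset(attack_types, rng), budget)


rng = random.Random(3)
ingresses = ["e1", "e2", "e3", "e4"]
attack_types = ["syn", "dns", "udp", "elephant"]
mixes = [strategy(ingresses, attack_types, 100.0, rng)
         for strategy in (rand_ingress, rand_attack, rand_hybrid)]
for mix in mixes:
    print(len(mix), "pairs, total volume", round(sum(mix.values()), 6))
```

Calling one of these functions once per epoch yields a sequence of attack mixes that respects the fixed-budget threat model from §6.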

Given the typical DDoS attack duration (≈ 6
hours), we consider an attack lasting for 5000 5-second
epochs (i.e., roughly 7 hours). Bohatei is initialized
with a zero starting point of attack estimates. The metric
of interest we report is the normalized regret with respect
to the best static decision in hindsight; i.e., if we
had to pick a single static strategy for the entire duration.
Figure 14a and Figure 14b show the regret w.r.t. the two
goals G1 (the number of VMs) and G2 (volume of successful
attack) for a 24-node topology. The results are
similar using other topologies and are not shown here.
Overall, Bohatei’s online adaptation achieves low regret
across the adversarial strategies compared to two strawman
solutions: (1) uniform estimates, and (2) estimates
given the previous measurements.

9 Related Work 


DDoS has a long history; we refer readers to surveys for a
taxonomy of DDoS attacks and defenses. We
have already discussed relevant SDN/NFV work in the
previous sections. Here, we briefly review other related
topics.

Attack detection: There are several algorithms for detecting
and filtering DDoS attacks. These include time
series detection techniques, use of backscatter
analysis, exploiting attack-specific features,
and network-wide analysis.
These are orthogonal to the focus of this paper.


DDoS-resilient Internet architectures: These include
the use of capabilities [58], better inter-domain routing,
inter-AS collaboration (e.g., [39]), packet
marking and unforgeable identifiers (e.g., [26]), and
traceback (e.g., [51]). However, they do not provide an
immediate deployment path or resolution for current networks.
In contrast, Bohatei focuses on a more practical,
single-ISP context, and is aligned with economic incentives
for ISPs and their customers.


Overlay-based solutions: There are overlay-based solutions
(e.g., [25, 52]) that act as a “buffer zone” between
attack sources and targets. The design contributions in
Bohatei can be applied to these as well.


SDN/NFV-based security: There are a few efforts in
this space, such as FRESCO and AvantGuard [54].
As we saw earlier, these SDN solutions will introduce
new DDoS avenues because of the per-flow and reactive
model. Solving this control bottleneck requires
hardware modifications to SDN switches to add “stateful”
components, which is unlikely to be supported by
switch vendors soon [54]. In contrast, Bohatei chooses
a proactive approach of setting up tag-based forwarding
rules that is immune to these pitfalls.


10 Conclusions 


Bohatei brings the flexibility and elasticity benefits of
recent networking trends, such as SDN and NFV, to
DDoS defense. We addressed practical challenges in
the design of Bohatei’s resource management algorithms
and control/data plane mechanisms to ensure that these
do not become bottlenecks for DDoS defense. We
implemented a full-featured Bohatei prototype built on
industry-standard SDN control platforms and commodity
network appliances. Our evaluations on a real testbed
show that Bohatei (1) is scalable and responds rapidly to
attacks, (2) outperforms naive SDN implementations that
do not address the control/data plane bottlenecks, and (3)
enables resilient defenses against dynamic adversaries.
Looking forward, we believe that these design principles
can also be applied to other aspects of network security.

A ILP Formulation

The ILP formulation for an optimal resource management
(mentioned in §4.2) is shown in Figure 15.

Variables: In addition to the parameters and variables
that we have defined earlier in §4, we define the binary
variable $q_{d,a,i,vm,s,i',vm',s',l}$ as follows: if it is 1, VM $vm$ of


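$$
\begin{aligned}
&(1)\ \text{Minimize}\quad \alpha \times \sum_{e}\sum_{a}\sum_{d} f_{e,a,d} \times T_{e,a} \times L_{e,d} + \sum_{d} dsc_{d} \\
&\text{s.t.} \\
&(2)\ \forall e,a:\ \sum_{d} f_{e,a,d} = 1 && \triangleright\ \text{all suspicious traffic should be served} \\
&(3)\ \forall a,d:\ t_{a,d} = \sum_{e} f_{e,a,d} \times T_{e,a} && \triangleright\ \text{traffic of each type to each datacenter} \\
&(4)\ \forall d:\ \sum_{a} t_{a,d} \le C^{link}_{d} && \triangleright\ \text{datacenter link capacity} \\
&(5)\ \forall d,a,i:\ \sum_{s \in S_d} n^{d,s}_{a,i} \ge t_{a,d} \times \frac{\sum_{i':(i',i)=e^{annotated}_{a,i' \to i}} W_{a,i' \to i}}{P_{a,i}} && \triangleright\ \text{provisioning sufficient VMs } (S_d \text{ is the set of } d\text{'s servers}) \\
&(6)\ \forall d, s \in S_d:\ \sum_{a}\sum_{i} n^{d,s}_{a,i} \le C^{compute}_{d,s} && \triangleright\ \text{server compute capacity} \\
&(7)\ \forall d:\ dsc_{d} = intraR_{d} \times IntraUnitCost + interR_{d} \times InterUnitCost && \triangleright\ \text{total cost within each datacenter} \\
&(8)\ \forall d:\ intraR_{d} = \sum_{a} \sum_{(i,i')=e^{annotated}_{a,i \to i'}} \sum_{(s,s') \in sameRack} \sum_{vm=1}^{MaxVM} \sum_{vm'=1}^{MaxVM} \sum_{l=1}^{MaxVol} q_{d,a,i,vm,s,i',vm',s',l} && \triangleright\ \text{intra-rack cost} \\
&(9)\ \forall d:\ interR_{d} = \sum_{a} \sum_{(i,i')=e^{annotated}_{a,i \to i'}} \sum_{(s,s') \notin sameRack} \sum_{vm=1}^{MaxVM} \sum_{vm'=1}^{MaxVM} \sum_{l=1}^{MaxVol} q_{d,a,i,vm,s,i',vm',s',l} && \triangleright\ \text{inter-rack cost} \\
&(10)\ \forall d,a,i',vm':\ \sum_{s} \sum_{s'} \sum_{i:(i,i')=e^{annotated}_{a,i \to i'}} \sum_{vm=1}^{MaxVM} \sum_{l=1}^{MaxVol} q_{d,a,i,vm,s,i',vm',s',l} \le P_{a,i'} && \triangleright\ \text{enforcing VMs capacities} \\
&(11)\ \forall d, s \in S_d, a, i':\ n^{d,s}_{a,i'} \times P_{a,i'} \ge \sum_{vm=1}^{MaxVM} \sum_{vm'=1}^{MaxVM} \sum_{i:(i,i')=e^{annotated}_{a,i \to i'}} \sum_{s'} \sum_{l=1}^{MaxVol} q_{d,a,i,vm,s,i',vm',s',l} && \triangleright\ \text{bound traffic volumes} \\
&(12)\ \forall d, s \in S_d, a, i':\ n^{d,s}_{a,i'} \times P_{a,i'} \le \sum_{vm=1}^{MaxVM} \sum_{vm'=1}^{MaxVM} \sum_{i:(i,i')=e^{annotated}_{a,i \to i'}} \sum_{s'} \sum_{l=1}^{MaxVol} q_{d,a,i,vm,s,i',vm',s',l} + 1 && \triangleright\ \text{bound traffic volumes} \\
&(13)\ \triangleright\ \text{flow conservation for VM } vm \text{ of type logical node } k \text{ that has both predecessor(s) and successor(s)} \\
&\qquad \forall d,a,k,vm:\ \sum_{vm'=1}^{MaxVM} \sum_{g:(g,k)=e^{annotated}_{a,g \to k}} \sum_{s} \sum_{s'} \sum_{l=1}^{MaxVol} q_{d,a,g,vm',s',k,vm,s,l} = \sum_{vm'=1}^{MaxVM} \sum_{h:(k,h)=e^{annotated}_{a,k \to h}} \sum_{s} \sum_{s'} \sum_{l=1}^{MaxVol} q_{d,a,k,vm,s,h,vm',s',l} \\
&(14)\ \forall link \in \text{ISP backbone}:\ \sum_{link \in Path_{e \to d}} \sum_{a} f_{e,a,d} \times T_{e,a} \le \beta \times MaxLinkCapacity && \triangleright\ \text{per-link traffic load control} \\
&(15)\ f_{e,a,d} \in [0,1],\ q_{d,a,i,vm,s,i',vm',s',l} \in \{0,1\},\ n^{d}_{a,i}, n^{d,s}_{a,i} \in \{0,1,\dots\},\ t_{a,d}, interR_{d}, intraR_{d}, dsc_{d} \in \mathbb{R} && \triangleright\ \text{variables}
\end{aligned}
$$

Figure 15: ILP formulation for an optimal resource management.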


type Vaj runs on server i and sends 1 unit of traffic (e.g., 1 
Gbps) to VM vm' of type ,/ that runs on server s\ where 
G ^annoMrerf^ Servers s and s' are located in 
datacenter d; otherwise, qd,a,i,vm,s,i',vm',s',i = 0- Here / is 
an auxiliary subscript indicating that the one unit of traf¬ 
fic associated with q is the Ith one out of MaxVol possible 
units of traffic. The maximum required number of VMs 
of any type is denoted by MaxVM. 

The ILP involves two key decision variables: (1) $f_{e,a,d}$
is the fraction of traffic $T_{e,a}$ to send to datacenter $D_d$, and
(2) $n^{d,s}_{a,i}$ is the number of VMs of type $v_{a,i}$ on server $s$ of
datacenter $d$, hence physical graphs $DAG^{physical}_{a,d}$.

Objective function: The objective function (1) is
composed of inter-datacenter and intra-datacenter costs,
where constant $\alpha > 0$ reflects the relative importance of
inter-datacenter cost to intra-datacenter cost.
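
As a small worked illustration of objective (1), the sketch below evaluates the cost of one candidate assignment in Python. It assumes that $L_{e,d}$ is a per-unit cost of carrying traffic from ingress $e$ to datacenter $d$ and that the intra-datacenter costs $dsc_d$ have already been computed; the dictionary layout and the toy numbers are hypothetical and are not taken from the evaluation.

```python
def objective(alpha, f, T, L, dsc):
    """Evaluate objective (1) for a given assignment.

    f:   {(e, a, d): fraction of T[e, a] sent to datacenter d}
    T:   {(e, a): suspicious traffic volume of type a at ingress e}
    L:   {(e, d): per-unit cost from ingress e to datacenter d} (assumed meaning)
    dsc: {d: intra-datacenter cost of d, as given by constraint (7)}
    """
    inter_dc = sum(frac * T[(e, a)] * L[(e, d)] for (e, a, d), frac in f.items())
    intra_dc = sum(dsc.values())
    return alpha * inter_dc + intra_dc


# Hypothetical toy instance: one ingress, one attack type, two datacenters.
T = {("e1", "syn"): 10.0}
L = {("e1", "d1"): 2.0, ("e1", "d2"): 5.0}
f = {("e1", "syn", "d1"): 0.75, ("e1", "syn", "d2"): 0.25}
dsc = {"d1": 3.0, "d2": 1.0}

cost = objective(0.5, f, T, L, dsc)
```

A larger $\alpha$ makes the first term dominate, so assignments that keep traffic close to its ingress are preferred over assignments that pack VMs tightly inside a datacenter.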

Constraints: Equation (2) ensures all suspicious traffic
will be sent to datacenters for processing. Equation
(3) computes the amount of traffic of each attack type
going to each datacenter, which is ensured to be within
each datacenter's bandwidth capacity using (4). Equation (5) is
intended to ensure sufficient numbers of VMs of the required
types in each datacenter. Server compute capacities
are enforced using (6). Equation (7) sums up the cost
associated with each datacenter, which is composed of
two components: the intra-rack cost, given by (8), and the
inter-rack cost, given by (9). Equation (10) ensures the
traffic processing capacity of each VM is not exceeded.
Equations (11) and (12) tie the variables for the number
of VMs (i.e., $n^{d,s}_{a,i}$) and traffic (i.e., $q_{d,a,i,vm,s,i',vm',s',l}$) to
each other. Flow conservation of nodes is guaranteed
by (13). Inequality (14) ensures no ISP backbone link
gets congested (i.e., by getting a traffic volume of more
than a fixed fraction $\beta$ of its maximum capacity), while
$Path_{e \to d}$ is a path from a precomputed set of paths from
$e$ to $d$. The ILP decision variables are shown in (15).
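
To make constraints (2) through (4) concrete, the following Python sketch checks whether a candidate set of fractions $f_{e,a,d}$ serves all suspicious traffic and respects datacenter link capacities. It is only a feasibility check under assumed data structures (plain dictionaries keyed by hypothetical ingress, attack-type, and datacenter names); it does not solve the ILP.

```python
from collections import defaultdict


def check_assignment(f, T, C_link, tol=1e-9):
    """Check constraints (2)-(4) for a candidate assignment.

    f:      {(e, a, d): fraction of T[e, a] sent to datacenter d}
    T:      {(e, a): suspicious traffic volume}
    C_link: {d: link capacity of datacenter d}
    Returns (all_served, within_capacity, t) where t[(a, d)] is as in (3).
    """
    # Constraint (2): the fractions of every (e, a) pair must sum to 1.
    served = defaultdict(float)
    for (e, a, d), frac in f.items():
        served[(e, a)] += frac
    all_served = all(abs(served[key] - 1.0) <= tol for key in T)

    # Constraint (3): traffic of each type arriving at each datacenter.
    t = defaultdict(float)
    for (e, a, d), frac in f.items():
        t[(a, d)] += frac * T[(e, a)]

    # Constraint (4): total traffic into d must fit its link capacity.
    load = defaultdict(float)
    for (a, d), volume in t.items():
        load[d] += volume
    within_capacity = all(load[d] <= C_link[d] + tol for d in load)

    return all_served, within_capacity, dict(t)


# Hypothetical toy instance.
T = {("e1", "syn"): 10.0, ("e2", "dns"): 4.0}
f = {("e1", "syn", "d1"): 0.5, ("e1", "syn", "d2"): 0.5, ("e2", "dns", "d1"): 1.0}

ok_served, ok_capacity, t = check_assignment(f, T, {"d1": 10.0, "d2": 5.0})
_, tight_capacity, _ = check_assignment(f, T, {"d1": 10.0, "d2": 4.0})
```

In the toy instance, datacenter d1 receives 9 units and d2 receives 5 units, so the assignment is feasible with capacities (10, 5) and violates (4) once the capacity of d2 drops to 4.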

B DSP and SSP Algorithms 

As described in §4.3, due to the impractically long time
needed to solve the ILP formulation, we design the DSP
and SSP heuristics for resource management. The ISP
global controller solves the DSP problem to assign suspicious
incoming traffic to datacenters. Then each local
controller solves an SSP problem to assign servers to
VMs. Figures 16 and 17 show the detailed pseudocode
for the DSP and SSP heuristics, respectively.

▷ Outputs: $DAG^{physical}_{a,d}$ and $f_{e,a,d}$ values

Build max-heap $T^{maxHeap}$ of attack volumes $T$
while !Empty($T^{maxHeap}$)
  do $t$ ← ExtractMax($T^{maxHeap}$)
     $d$ ← datacenter with min. $L_{t.e,d}$ and cap. > 0
     ▷ enforcing datacenter link capacity
     $t_1$ ← min($t$, $C^{link}_{d}$)
     ▷ compute capacity of $d$ for traffic type $a$
     $t_2$ ← $C^{compute}_{d} / \sum_{i} \left( \sum_{i'} W_{a,i' \to i} / P_{a,i} \right)$
     ▷ enforcing datacenter compute capacity
     $t_{assigned}$ ← min($t_1$, $t_2$)
     $f_{e,a,d}$ ← $t_{assigned} / T_{t.e,t.a}$
     for each module type $i$
       do ▷ update $n^{d}_{a,i}$ given new assignment
          $n^{d}_{a,i}$ ← $n^{d}_{a,i} + t_{assigned} \times \sum_{i'} W_{a,i' \to i} / P_{a,i}$
     $C^{link}_{d}$ ← $C^{link}_{d} - t_{assigned}$
     $C^{compute}_{d}$ ← $C^{compute}_{d} - t_{assigned} \times \sum_{i} \left( \sum_{i'} W_{a,i' \to i} / P_{a,i} \right)$
     ▷ leftover traffic
     $t_{unassigned}$ ← $t - t_{assigned}$
     if ($t_{unassigned}$ > 0)
       then Insert($T^{maxHeap}$, $t_{unassigned}$)
for each datacenter $d$ and attack type $a$
  do Given $n^{d}_{a,i}$ and $DAG^{annotated}_{a}$, compute $DAG^{physical}_{a,d}$

Figure 16: Heuristic for datacenter selection problem (DSP).
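
The greedy loop of Figure 16 can be sketched in Python as follows. This is an illustrative sketch under several assumptions, not the prototype's implementation: `vm_per_unit[a][i]` is a hypothetical table standing in for the per-module ratio of traffic share to VM capacity, each VM is assumed to consume one unit of datacenter compute capacity, VM demand is accumulated as a possibly fractional number, and traffic that finds no datacenter with remaining capacity is returned in `unserved` so that the loop always terminates.

```python
import heapq

EPS = 1e-9


def dsp(T, L, C_link, C_compute, vm_per_unit):
    """Greedy datacenter selection in the spirit of Figure 16.

    T: {(e, a): volume}, L: {(e, d): cost}, C_link / C_compute: {d: capacity},
    vm_per_unit: {a: {i: VMs of module i needed per unit of type-a traffic}}.
    Returns (f, n, unserved).
    """
    link, comp = dict(C_link), dict(C_compute)   # remaining capacities
    f, n, unserved = {}, {}, {}
    # heapq is a min-heap, so volumes are negated to pop the largest first.
    heap = [(-vol, e, a) for (e, a), vol in T.items() if vol > EPS]
    heapq.heapify(heap)
    while heap:
        neg_vol, e, a = heapq.heappop(heap)
        t = -neg_vol
        demand = sum(vm_per_unit[a].values())    # compute units per unit traffic
        candidates = [d for d in link
                      if link[d] > EPS and comp[d] / demand > EPS]
        if not candidates:                       # assumption: report leftovers
            unserved[(e, a)] = t
            continue
        d = min(candidates, key=lambda dc: L[(e, dc)])
        t1 = min(t, link[d])                     # link capacity
        t2 = comp[d] / demand                    # compute capacity
        assigned = min(t1, t2)
        f[(e, a, d)] = f.get((e, a, d), 0.0) + assigned / T[(e, a)]
        for i, ratio in vm_per_unit[a].items():
            n[(d, a, i)] = n.get((d, a, i), 0.0) + assigned * ratio
        link[d] -= assigned
        comp[d] -= assigned * demand
        leftover = t - assigned
        if leftover > EPS:
            heapq.heappush(heap, (-leftover, e, a))
    return f, n, unserved


# Hypothetical toy instance: two ingresses, one attack type, two datacenters.
T = {("e1", "syn"): 8.0, ("e2", "syn"): 4.0}
L = {("e1", "d1"): 1.0, ("e1", "d2"): 3.0, ("e2", "d1"): 2.0, ("e2", "d2"): 1.0}
f, n, unserved = dsp(T, L, {"d1": 6.0, "d2": 10.0}, {"d1": 100.0, "d2": 100.0},
                     {"syn": {"analyze": 1.0, "proxy": 0.5}})
```

In this instance the largest volume (8 units at e1) first fills the 6-unit link of its closest datacenter d1, and the 2 leftover units are reinserted into the heap and later sent to d2.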


▷ Inputs: $DAG^{physical}_{a,d}$, IntraUnitCost, InterUnitCost,
  and $C^{compute}_{d,s}$ values
▷ Outputs: $n^{d,s}_{a,i}$ values

while entire $DAG^{physical}_{a,d}$ is not assigned to $d$'s servers
  do $N$ ← logical node $v_{a,i}$ all of whose predecessors are assigned
     if ($N$ == NIL)
       then $N$ ← logical node $v_{a,i}$ with max $P_{a,i}$
     localize(nodes of $DAG^{physical}_{a,d}$ corresponding to $N$)

▷ function localize tries to assign all of its
  input physical nodes to the same server or rack
localize(inNodes) {
  assign all inNodes to emptiest server
  if failed
    then assign all inNodes to emptiest rack
    if failed
      then split inNodes across racks
  update $n^{d,s}_{a,i}$ values
}

Figure 17: Heuristic for server selection problem
(SSP) at datacenter d.
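
The three-step fallback of the localize function (same server, then same rack, then split across racks) can be sketched in Python as below. The sketch simplifies the figure: it places a count of identical VMs rather than a set of physical graph nodes, and it assumes hypothetical inputs `free` (free VM slots per server) and `rack_of` (the rack of each server).

```python
def localize(num_vms, free, rack_of):
    """Place num_vms VMs, preferring one server, then one rack, then any rack.

    free:    {server: free VM slots}, updated in place
    rack_of: {server: rack identifier}
    Returns {server: number of VMs placed there}.
    """
    if sum(free.values()) < num_vms:
        raise ValueError("insufficient compute capacity in this datacenter")

    # Step 1: try the emptiest server.
    emptiest = max(free, key=free.get)
    if free[emptiest] >= num_vms:
        free[emptiest] -= num_vms
        return {emptiest: num_vms}

    # Step 2: try the emptiest rack.
    rack_free = {}
    for server, slots in free.items():
        rack_free[rack_of[server]] = rack_free.get(rack_of[server], 0) + slots
    best_rack = max(rack_free, key=rack_free.get)
    if rack_free[best_rack] >= num_vms:
        servers = [s for s in free if rack_of[s] == best_rack]
    else:
        # Step 3: split across racks, filling the emptiest racks first.
        servers = list(free)

    placement, remaining = {}, num_vms
    order = sorted(servers, key=lambda s: (rack_free[rack_of[s]], free[s]),
                   reverse=True)
    for server in order:
        take = min(free[server], remaining)
        if take > 0:
            placement[server] = take
            free[server] -= take
            remaining -= take
        if remaining == 0:
            break
    return placement


# Hypothetical datacenter with two racks of two servers each.
free = {"s1": 2, "s2": 3, "s3": 4, "s4": 1}
rack_of = {"s1": "r1", "s2": "r1", "s3": "r2", "s4": "r2"}

p1 = localize(3, free, rack_of)   # fits on a single server
p2 = localize(4, free, rack_of)   # fits within a single rack
p3 = localize(3, free, rack_of)   # has to be split across racks
```

The three calls exercise the three branches in order, since each placement consumes slots and leaves less room for the next one.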











